[57] ABSTRACT

A method and apparatus for forming a visual index of scenes in a video
image which has been or is being recorded in a computer readable
memory. A selected number of keyframes are derived from the recorded
image, each being representative of a respective scene therein. The
keyframes are then ordered into a selected number of levels of detail
of the scenes represented thereby, each level including a predetermined
number of keyframes, each subsequent level including keyframes of
greater detail than those in a preceding level. A header file is then
formed which is descriptive of the ordered set of keyframes, and the
header file is stored together with the ordered set of keyframes in the
computer readable memory. A user can thereby identify and obtain
optimized retrieval in accordance with his preferences of particular
segments of the video image from a relatively slow memory device. The
method and apparatus are equally applicable to formation of an indexed
order of binary large objects ("blobs") in a set of multimedia
documents in accordance with a user's preferences.
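
As a rough illustration of the ordering and storage steps summarized
above, the following Python sketch builds a small balanced hierarchy,
flattens it level by level, and produces a header describing the
result. The names Keyframe, build_hierarchy, flatten_by_level and
make_index, the header field names, and the choice of a level ordering
are assumptions made for illustration only; they are not taken from the
patent, which also permits unbalanced hierarchies and other orderings.

```python
from dataclasses import dataclass, field


@dataclass
class Keyframe:
    """A keyframe and the more detailed keyframes beneath it (hypothetical type)."""
    label: str
    children: list["Keyframe"] = field(default_factory=list)


def build_hierarchy(fanout: int, levels: int) -> list[Keyframe]:
    """Build a balanced example tree labeled 1, 11, 111, ... (illustrative only)."""
    def make(label: str, depth: int) -> Keyframe:
        node = Keyframe(label)
        if depth < levels:
            node.children = [make(label + str(i), depth + 1)
                             for i in range(1, fanout + 1)]
        return node

    return [make(str(i), 1) for i in range(1, fanout + 1)]


def flatten_by_level(top: list[Keyframe]) -> list[list[str]]:
    """Group keyframe labels by level, least detailed level first."""
    levels: list[list[str]] = []
    current = list(top)
    while current:
        levels.append([k.label for k in current])
        current = [child for k in current for child in k.children]
    return levels


def make_index(top: list[Keyframe]) -> tuple[dict, list[str]]:
    """Return (header, linear keyframe order); the header is meant to be stored first."""
    levels = flatten_by_level(top)
    header = {
        "number_of_levels": len(levels),
        "keyframes_per_level": [len(level) for level in levels],
    }
    linear_order = [label for level in levels for label in level]
    return header, linear_order


header, order = make_index(build_hierarchy(fanout=6, levels=3))
print(header["keyframes_per_level"])  # [6, 36, 216]
print(order[:8])  # ['1', '2', '3', '4', '5', '6', '11', '12']
```

With this ordering, a reader of the linear index meets the six least
detailed keyframes before any more detailed ones, so something can be
shown to the user before the rest has been read from a slow device.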

9 Claims, 7 Drawing Sheets 
APPARATUS AND METHOD FOR OPTIMIZING KEYFRAME AND BLOB
RETRIEVAL AND STORAGE

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an apparatus and method for storing in
a computer readable medium keyframes of a video image or excerpts from
a document, and more particularly to providing for storage thereof so
as to optimize retrieval from a relatively slow memory device.

2. Description of the Related Art

In a video indexing process, keyframes that visually describe a video
image may be extracted from the video by cut detection and keyframe
filtering such as described in pending patent applications "Significant
Scene Detection and Frame Filtering for a Visual Indexing System", U.S.
Ser. No. 08/867,140 pending and "Video Indexing System", U.S. Ser. No.
08/867,145 pending, having amongst their inventors the inventors of the
present invention, to create an index. In video cut detection and
keyframe filtering, keyframes are selected from a large number of
possible frames (30 frames per second of video, typically). Even after
the keyframe filtering process, the number of keyframes is
considerable, approximately 250 keyframes per video tape. Typically
then, the size of an index is approximately 1 MB, if the keyframes are
scaled down to 160x120 resolution and compressed into JPEG format.
Without scaling and compression, the size of the index could be 50 MB
or more. At this size, retrieval of keyframes could take a considerable
amount of time, especially if the retrieval is performed over slow
channels such as high latency networks (e.g., Internet, Intranet, etc.)
or linear tape mediums such as VHS tape.

Similarly, for web sites, web pages or multimedia or hypermedia
documents including blobs are presented. A multimedia document or web
page containing video (or images) can require a large amount of memory
which may be on the order of tens of megabytes. Time required to
download such a multimedia document or software may be considerable
with a typical 28.8 kb/sec modem.

A website may include a large number of possible web pages, multimedia
documents and links which may be unwieldy for a user to navigate. Each
multimedia document or web page may include blobs. The blobs may
include audio, video, text, hypertext links or links to other
documents. A website retrieval of pages or multimedia documents and
their respective blobs, especially those a user has an interest in, may
take a considerable amount of time as blobs typically are stored in
temporal or static hierarchies. A multimedia document may be created
which provides a user with web pages having audio, video, text and
links based on user preferences or other prespecified criteria.

To optimize retrieval of the keyframes or blobs in a user friendly
manner, an index or multimedia document is created using a hierarchical
structure representation. Temporal hierarchies have been described in
the literature, such as Ueda, Hirotada and Takafumi Miyatake.
"Automatic Scene Separation and Tree Structure GUI for Video Editing",
The Fourth ACM International Multimedia Conference, Multimedia (Nov.
18-22, 1996): 405-406, as a conceptual representation of keyframes. The
present invention creates a linear index structure or linear multimedia
document structure out of the temporal hierarchy, allowing for
optimized retrieval. Currently, storage in databases is typically not
optimized for retrieval, but instead, optimized for transaction
processing. For example, database systems are optimized for transaction
processing such as editing data (i.e., inserting, updating and deleting
data) in a database of the system. Query optimization is available
also; however, benchmarks of database systems concentrate on changing
data as fast as possible for parallel requests.

In databases, order of retrieval is not known in advance since database
management systems typically have no knowledge of stored data content
or what query will be requested.

In a Digital Compact Cassette (DCC) format, an index system describes
which tracks are on a specific tape; however, priority between
different tracks does not exist; therefore, optimization of retrieval
of the content is not possible.

For a Web page or another similar type multimedia document, information
is provided to a user based on a format prespecified by the provider,
not on a user-stored preference.

SUMMARY OF THE INVENTION

An object of the invention is to provide a system which optimizes
access to an index of multimedia documents. For that purpose, the
invention groups keyframes in nodes in blobs and structures and stores
them in a hierarchical manner. The hierarchy includes nodes which are
parent or child nodes of blobs based on prespecified user preferences.
The number of keyframes (images) in a node and the number of child
nodes under a parent node are arbitrary.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a sample visual index hierarchy;

FIGS. 2A-2B illustrate a visual hierarchy for the present invention;

FIG. 3 illustrates a sample header file;

FIGS. 4A-4B illustrate hierarchies with group headers;

FIG. 5 illustrates a linear representation of the hierarchy;

FIGS. 6A-6E illustrate detailed representations of the hierarchy;

FIGS. 7A-7B illustrate systems of the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention includes nodes of keyframes or blobs and links in
a hierarchy as illustrated in FIG. 1. Although keyframes are referred
to in the description, the description is also applicable to blobs.

In the present invention, as shown in FIG. 2A, six parent keyframes are
in a parent node and a maximum of thirty-six child nodes (six child
keyframes per parent keyframe) are under a parent node. Clearly, one
skilled in the art could modify the number of nodes or number of child
nodes under a parent node.

For reference, the top level of nodes (in this example, one node having
six keyframes) is Level A, with keyframes labeled 1, 2 . . . x. The
second level of nodes is Level B, and includes six nodes. The keyframes
are labeled 11, 12, 13 . . . 16, 21, 22, 23 . . . 26, 31, 32, 33 . . .
36 . . . ; and the keyframes on the third level, Level C, are labeled
111, 112 . . . 116, 121, 122 . . . 126 . . . . The keyframes are
numbered, for easy reference and illustration only, to indicate their
level and order in the level. The various levels of the hierarchy
correspond to the level of detail shown with respect to the underlying
video, in this example, with decreasing representation of the video as
a whole. For
example, those keyframes on Level A are the six most representative
frames of the video while those keyframes on Level B are the next most
representative and on Level C, the next representative.

An example of the hierarchy presented in FIG. 2A is for a video which
is six hours long and partitioned into x time parts. In this example,
the top nodes on Level A (only one node is shown), each have six parent
keyframes that together represent the entire video and each parent
keyframe has six child keyframes. Each of the six parent keyframes may
correspond to one hour of the entire video, thus partitioning the video
in equal blocks of hours, or may correspond to periods of time based on
video program structure.

The keyframes on Level B provide more details about the portion of the
video tape represented by the parent keyframe. Specifically, keyframes
11, 12, 13 . . . 16 under keyframe 1 provide more detail about the
first block of time which keyframe 1 represents. Every keyframe
represents a portion of video. For this example, six keyframes are
selected to represent the entire video as parent keyframes (Level A),
thirty-six keyframes are selected to represent the entire video as
child keyframes (Level B) and two hundred and sixteen keyframes are
selected to represent the entire video as grandchild keyframes (Level
C). Each next level of nodes contains keyframes which are
representative of each portion of video of the relevant parent node.

For example, node 1 has all the details of the first portion of the
video as represented by six parent keyframes (1-6). On the next level,
keyframe 1, for example, is further detailed by six child keyframes
11-16. On the next level, keyframe 11, for example, is further detailed
by six grandchild keyframes 111-116.

The hierarchy created does not necessarily represent a balanced tree.
Additionally, the keyframe 1 may be the same as keyframe 11 and
keyframe 111.

The temporal hierarchy can be stored on a memory device such as a disk
or tape using many different structures. In the present invention, the
hierarchy is "flattened" for storage in a computer-readable medium by
describing the structure in a header file and by grouping the keyframes
in independent nodes. For a file, in this example, the filenames of the
keyframes represent associated time information in respective intervals
of one thirtieth of a second.

Additional more descriptive information from an associated visual index
may also be included in the header file, as is done in the present
invention. Information in this file is presented in attribute-value
pairs at three levels: tape, node and frame. The attribute-value pairs
of the present structure give freedom for inserting new attributes, for
example, levels for classification of the tapes or objects within a
frame.

Similarly, the present invention may be used for providing and/or
retrieving multimedia documents or hypermedia documents such as a web
page. A user may have specific interests, allowing a user profile or
user preference information to be created by a server which may then
package information dynamically. For example, as shown in FIG. 2B, a
document (Document) 1A may contain audio, video (images), text and/or
links to further documents (Doc1, Doc2, Doc3, etc.) 11A. A user may
only have interest in information contained in some of the audio,
video, text or further documents, for example, Doc2 and Doc3, but not
others such as Doc1. Each further document, Doc1-Doc3, may include
text, audio, and/or video and further links to still further multimedia
documents 111A.

The hierarchy created does not necessarily represent how a user would
wish to retrieve the information (audio, video, text, and/or links) or
have any relation to a user's preference. An analysis can be performed
on the information based on a prespecified user profile and the
information can be reordered into a temporal hierarchy by "flattening"
the reordered hierarchy into a user file which is embodied on a
computer-readable medium.

FIG. 3 illustrates a sample header file. A header may include such
information as video tape ID, title of the video, category of the
video, recording date, index date, tape length, version of the visual
index, resolution of the images, number of levels, number of child
nodes, and number of key frames in the visual index. This information
codes frame numbers and information and based on the coded frame
numbers and information, can calculate from which position on the
storage, i.e., video tape, CD, a VCR should be positioned. It may be
desired to limit the information stored in the header file to prevent
data corruption and to reduce storage. Additionally, the header file
could be stored in several places on the storage medium to prevent data
corruption.

In this example (FIGS. 4A and 4B), a visual index contains a header
file (video header) 410 or 416 and the keyframes or keyframe images 412
& 414 or 418 and 420. The visual index of, in this example, 216
keyframe images has a header file of 4 KB while the keyframe images
take 844 KB. Although in the present example, one header file is used
which may be specific or general to the video, level or group headers
(422 and 424 or 426 and 428) could be added to describe specific levels
of nodes as shown in FIGS. 4A and 4B, as could other types of headers.

FIG. 4A illustrates a hierarchical level wise keyframe clustering while
FIG. 4B illustrates a parent-child wise clustering of keyframes for
storage.

FIG. 5 illustrates a visual index structure which flattens and linearly
represents the hierarchy. In an archiving process, this structure is
created on a temporary device such as a disk or other computer-readable
medium and written in its entirety to a linear medium, such as a tape
or over a network. In the present invention, the header file is the
first file to allow easy access to information saved in the visual
index. Ordering of keyframe image node files is done depending on the
rendering of the hierarchical temporal structure.

Depending on the user interface, the nodes of keyframes are ordered in
a selected structure and saved. Several different ordering structures
are possible, as shown in FIGS. 6A-E. Specifically, FIG. 6A illustrates
a hierarchical top-down ordering, FIG. 6B illustrates a left-right
ordering and FIG. 6C illustrates a level ordering. FIG. 6D illustrates
a level ordering which eliminates redundant storing of same frames.
Specifically, as previously mentioned, keyframe 1, 11, and 111 may
represent the same image and thus, storage of all three is redundant.
Thus, only keyframe 1, for example, is stored.

FIG. 6E illustrates an ordering for a multimedia document which
eliminates links to other documents, text, audio, or video, in which a
user has indicated disinterest, to provide a user file. FIG. 6E
provides an example of ordering for the example described in FIG. 2B.

In all orderings, a node header, if used, may include such information
as ID, number of key frames for the specific level, and for each key
frame, ID, annotation, position, number of child nodes and frame
signature.

Node images may also be included. For each keyframe, information such
as ID and image data may be included.

To retrieve the saved keyframes, the header file is read first, then
the Level A first keyframes or blobs are read and
stored on a temporary device such as a disk or other computer-readable
medium. To optimize retrieval of the visual index or multimedia
document, the visual index or multimedia document is restored in
different segments. After each segment is read, the information can be
displayed to the user. Thus, a user does not have to wait for the
entire visual index or multimedia document to load to look at levels or
areas of interest already loaded. A user may see the most
representative keyframes or blobs of interest and progress toward more
detail as the visual index or web page, respectively, is being loaded.
At the moment the keyframe image node or blob that for the user
interface is read it is sent to a memory from where the images, etc.
may be displayed to the user. Finally, the other keyframe images or
blobs are loaded in a prespecified order.

FIGS. 7A and 7B illustrate example systems of the present invention.
Specifically, in FIG. 7A, a storage 702 has a selected number of most
representative keyframes as provided by a video indexing system or
other automatic or manual means. The storage 702 provides the selected
keyframes to a first processor 704 which orders the keyframes into a
selected number of levels, each level including a predetermined number
of the most representative keyframes and each subsequent level
including a multiple number of keyframes of the previous level. A
second processor 705, which may be a separate second processor 705 or a
part of the first processor 704, creates at least one header file based
on information about the most representative keyframes of the video.

The header file and keyframes are embodied in an index file in a memory
706 which may be a separate memory or part of the storage 702. A unit
708 which may be a separate device such as a computer, VCR or
television and may have a user-interface, then retrieves the index file
and presents the keyframes for each level, as each level is retrieved.

Similarly, the example system in FIG. 7B has a storage 710 which may
be, for example, present in a server. A first processor 712 would order
blobs into a selected number of levels. Each level would include at
least one blob of text, video, audio and links to other multimedia or
hypertext documents. Each subsequent level would include at least one
blob of text, video, audio and further links for each of the other
multimedia or hypertext documents.

A second processor 713, which may be a separate processor or part of
the first processor 712, would organize blobs into a user file based on
user preference information. The second processor 713 would be able to
analyze a blob or link against a database or based on embedded
information, to determine if the blob or link falls within a user's
prespecified area of interest. The second processor then organizes
blobs and links based on this analysis to present those blobs and links
at the top of a user's prespecified areas of interest first, such as
was shown in FIG. 6E.

A memory 714, which may be a separate storage or part of the storage
710 would store the organized blobs and links embodied in the user
file. A unit 716, such as a computer, would retrieve the user file and
present the blobs and links, as each is retrieved.

As can now be readily appreciated, the invention allows storage of
keyframes or blobs so as to optimize retrieval from a relatively slow
memory device. The invention may be included in any of the subsystems
or may be a separate subsystem. One skilled in the art may easily use
differing numbers of nodes, keyframes, blobs, headers, node headers and
node images. Additional modifications may easily be made by one skilled
in the art.

The present invention may also be expanded to include video clips,
audio (sound, speech, music, etc.), colors or video characteristics,
and/or annotation, text or data (manually or automatically added) in
conjunction with or presented separately with the keyframes.

Further, a master index could be stored for a collection of video
tapes, files, etc. allowing a user to view the master index which may
include information as to where specific programs, segments, etc. are
stored.

The keyframes could also be [illegible] and consequently, reorganized
according to prespecified criteria such as user preferences or various
clustering methods, such as shown in FIGS. 4A and 4B. This [illegible]
such that those keyframes which are indicated as having a higher
priority are stored first in the data structure of the index file to
permit earlier retrieval.

It will thus be seen that the objects set forth above among those made
apparent from the preceding description, are efficiently attained and,
since certain changes may be made in the above construction without
departing from the spirit and scope of the invention, it is intended
that all matter contained in the above description or shown in the
accompanying drawings shall be interpreted as illustrative and not in a
limiting sense.

It is also to be understood that the following claims are intended to
cover all of the generic and specific features of the invention herein
described and all statements of the scope of the invention which, as a
matter of language, might be said to fall therebetween.

What is claimed is:

1. A method of forming a user index of scenes in a video image which is
recorded or being recorded in a computer-readable medium, said method
comprising the steps of:

retrieving from said recorded video image a selected number of
keyframes therein, each keyframe being a frame representative of a
respective scene in said image;

ordering the keyframes in accordance with user preference information
into a hierarchy of a selected number of levels of detail in the scenes
represented thereby, each level including a predetermined number of
keyframes, each subsequent level including keyframes of greater detail
than those in a preceding level; and

storing the ordered keyframes in said computer-readable medium to
thereby form said user index of scenes in said image ordered in
accordance with the user preference information, thus reducing the time
for accessing the ordered hierarchy for scenes according to the user's
preferences.

2. A method of forming a user index as claimed in claim 1, further
comprising the steps of:

creating at least one header file which is descriptive of said
keyframes; and

storing the at least one header file with the ordered keyframes, so
that said header file is included in the user index.

3. A system for forming a user index of scenes in a video which is
recorded or being recorded in a computer-readable memory, said system
comprising:

means for retrieving from said memory a selected number of keyframes of
the video image, each keyframe being a frame representative of a
respective scene in said image;

a first processor for ordering the keyframes in a hierarchy
in accordance with user preference information into a
selected number of levels of detail in the scenes represented
thereby, each level including a predetermined
number of keyframes, each subsequent level including
keyframes of greater detail than those in a preceding
level;

a second processor for creating at least one header file 
which is descriptive of said keyframes, and storing said 
at least one header file with the ordered keyframes so 
as to form said user index in said memory; and 

means for retrieving said user index and displaying the
keyframes therein for each level as such level is 
retrieved from said memory ordered in accordance with 
the user preference information, thus reducing the time 
for accessing the user index for scenes according to the 
user's preferences. 

4. A system as claimed in claim 3, wherein said first 
processor and said second processor are parts of a main 
processor. 

5. A method for forming a user file of binary large objects 
("blobs") in a set of multimedia documents recorded in a
computer-readable medium, so as to optimize retrieval of 
blobs in accordance with user preference information; said 
method comprising the steps of: 

creating a preference file based on the user preference
information; 

retrieving blobs from the multimedia documents and 
ordering them into a selected number of levels in 
accordance with said preference file, each level including
at least one blob from at least one of said documents
and at least one link to another of said documents, each 
subsequent level including blobs from further multi- 
media documents; and 

storing particular blobs and links in said computer-related
medium so as to form a user file for retrieval thereof
from said set of multimedia documents in accordance
with said preference file, thereby reducing the time for 
accessing the user file for blobs according to the user's 
preferences. 

6. A system for forming a user file of binary large objects 
("blobs") in a set of multimedia documents recorded in a 
computer-readable medium, so as to optimize retrieval of 
blobs in accordance with user preference information; com- 
prising: 

a first processor for retrieving blobs from the multimedia 
documents and ordering them into a selected number of 
levels, each level including at least one blob from at 
least one of said documents and at least one link to 
another of said documents, each subsequent level 
including blobs from further multimedia documents; 

a second processor for organizing the blobs and links from 
the first processor into a user file ordered in accordance 
with the user preference information; 

a memory for storing the user file; and 

means for retrieving blobs and links from the stored user 
file and displaying them to the user as each is retrieved, 
whereby the time for the user to access the user file for 
blobs and links according to the user's preferences is 
reduced. 

7. A system as claimed in claim 6, wherein said first 
processor and said second processor are part of a main 
processor. 

8. A system as claimed in claim 6, wherein said memory 
is part of said computer-readable medium. 

9. A system as claimed in claim 6, wherein said second 
processor analyzes each blob and link and determines if the 
respective blob or link is within an area of preference to the 
user. 
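
The preference-based ordering of blobs described for the second
processor (analyzing each blob or link, dropping those in which the
user has indicated disinterest, and presenting those within an area of
interest first) can be sketched as follows. This is only an
illustrative reading under stated assumptions: the Blob fields, the use
of topic tags as the "embedded information", and the functions
within_preference and build_user_file are hypothetical and do not
appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """A blob or link with illustrative metadata (hypothetical fields)."""
    name: str
    kind: str          # e.g. "audio", "video", "text" or "link"
    topics: frozenset  # topic tags assumed to be embedded in the blob
    level: int         # 0 = first document, 1 = further documents, ...


def within_preference(blob: Blob, interests: set) -> bool:
    """True if any topic of the blob falls within the user's areas of interest."""
    return bool(blob.topics & interests)


def build_user_file(blobs: list, interests: set, disinterests: set) -> list:
    """Drop blobs the user marked as uninteresting, then order the rest by
    level, placing blobs within an area of interest first on each level."""
    kept = [b for b in blobs if not (b.topics & disinterests)]
    return sorted(
        kept,
        key=lambda b: (b.level, not within_preference(b, interests)),
    )


# Example loosely modeled on FIG. 2B: the user cares about Doc2 and Doc3, not Doc1.
blobs = [
    Blob("Doc1", "link", frozenset({"sports"}), 1),
    Blob("Doc2", "link", frozenset({"music"}), 1),
    Blob("overview", "text", frozenset({"general"}), 0),
    Blob("Doc3", "link", frozenset({"film"}), 1),
]
user_file = build_user_file(blobs, interests={"music", "film"},
                            disinterests={"sports"})
print([b.name for b in user_file])  # ['overview', 'Doc2', 'Doc3']
```

Because Python's sorted is stable, blobs that compare equal keep their
original relative order, so the user file preserves the source ordering
within each level and preference group.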